Amendments



In the Specification: 

Please amend the paragraph between the heading "Brief Description of the Drawings" 
and the heading "Detailed Description of the Invention" as follows. Amendments are shown 
with additions underlined and deletions in strikethrough.




FIG. 1 is a control flow diagram according to flowchart illustrating one embodiment of 
the invention. 

As indicated below, please delete the paragraphs added between the paragraph ending on 
page 9, line 3 and the subheading beginning on page 9, line 4, which were added via the 
Amendment filed on December 29, 2003. Amendments are shown with additions underlined and 
deletions in strikethrough.

As discussed above, the first step of the classifying method is to calculate an Object
vector, i.e., an ordered set of a small number of data points or scalars (between 4 and 100, more
typically between 5 and 30) that is derived from the data stream, FIG. 1, 110, associated with the
Object to be classified. The transformation of the data stream into an Object vector is termed
"abstraction," FIG. 1, 120. The simplest abstraction process is to select a number of points of
the data stream. However, in principle the abstraction process can be performed on any function
of the data stream. In the embodiments presented below, abstraction is performed by selection of
a small number of specific intensities from the data stream.
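For illustration only, the simplest abstraction described above (selecting a small number of points of the data stream) might be sketched in Python as follows. The function name `abstract`, the position list `indices`, and the optional `transform` argument (standing in for "any function of the data stream") are hypothetical. The text does not specify how the selected positions are chosen.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def abstract(
    data_stream: Sequence[float],
    indices: Sequence[int],
    transform: Optional[Callable[[Sequence[float]], Sequence[float]]] = None,
) -> tuple[float, ...]:
    """Return an Object vector: the values of the (optionally transformed)
    data stream at a small number of chosen positions (hypothetical helper)."""
    # Optionally abstract from a function of the stream rather than the raw stream.
    source = transform(data_stream) if transform is not None else data_stream
    # Select the chosen points, in order, to form the ordered set of scalars.
    return tuple(float(source[i]) for i in indices)


# Example: four intensities picked from an eight-point data stream.
stream = [0.1, 0.5, 0.9, 0.4, 0.2, 0.8, 0.7, 0.3]
print(abstract(stream, [1, 2, 5, 6]))  # (0.5, 0.9, 0.8, 0.7)
```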
In one embodiment, the second step of the classifying method is to determine in which
data cluster, if any, the vector rests. FIG. 1, 130. Data clusters are mathematical constructs that
are the multidimensional equivalents of non-overlapping "hyperspheres" of fixed size in the
vector space. The location and associated classification or "status" of each data cluster is
determined by the learning algorithm from the training data set. The extent or size of each data
cluster and the number of dimensions of the vector space are set as a matter of routine
experimentation by the operator prior to the operation of the learning algorithm. If the vector
lies within a known data cluster, the Object is given the classification associated with that
cluster. FIG. 1, 150. In the simplest embodiments the number of dimensions of the vector
space is equal to the number of data points that are selected in the abstraction process.
Alternatively, however, each scalar of the Object vector can be calculated using multiple data
points of the data stream. If the Object vector rests outside of any known cluster, a classification
can be made of atypia, or atypical sample. FIG. 1, 110.
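A minimal sketch of the second step follows. It assumes that each data cluster is stored as a center point with an associated status, and that all clusters share one fixed radius set by the operator. Euclidean distance is assumed as the metric because the clusters are described as hyperspheres. The class `DataCluster`, the function `classify`, and the status labels are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DataCluster:
    center: Tuple[float, ...]  # location learned from the training data set
    status: str                # classification associated with the cluster


def classify(vector: Sequence[float], clusters: Sequence[DataCluster], radius: float) -> str:
    """Return the status of the hypersphere containing `vector`, else "atypia"."""
    for cluster in clusters:
        # Because the hyperspheres do not overlap, at most one can match.
        if math.dist(vector, cluster.center) <= radius:
            return cluster.status
    # Outside every known cluster: an atypical sample.
    return "atypia"


clusters = [
    DataCluster((0.0, 0.0, 0.0, 0.0), "normal"),    # hypothetical labels
    DataCluster((5.0, 5.0, 5.0, 5.0), "abnormal"),
]
print(classify((0.2, 0.1, 0.0, 0.0), clusters, radius=1.0))  # normal
print(classify((2.5, 2.5, 2.5, 2.5), clusters, radius=1.0))  # atypia
```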
After the paragraph that ends on page 18, line 22 and before the ABSTRACT, please
insert the following paragraphs. 

E. Summary of One Embodiment of the Invention 

FIG. 1 is a control flow diagram showing the top-level processing of the knowledge
discovery engine (KDE). Processing begins at step 302 and immediately continues to step 304. In step
304, the KDE 202 processes the chromosome strings 204 using a genetic algorithm. The 
chromosome strings 204 comprise data strings that are to be analyzed. The genetic algorithm 
inputs the chromosome strings 204 and, for each data string, identifies the chromosome variables
contained within the chromosome string 204. The chromosome variables 208 define the 
variables that the KDE 202 will look for in each chromosome string 204. 

The KDE 202 continues to step 306 and creates a lead cluster map, or grouping, for each 
processed chromosome string by using a pre-defined set of variables. The lead cluster map 
establishes clusters of data records around centroids in high-order dimensional space. The
membership of a record to a cluster is determined by Euclidean distance. If the Euclidean 
distance between a centroid and the record places the record inside a decision hyper-radius, the 
record belongs to the cluster surrounding the centroid. If the Euclidean distance between the 
record and any existing centroid is greater than the decision hyper-radius, the record establishes a 
new centroid and a new cluster. All data regarding the lead cluster mapping of the processed 
chromosome strings is recorded in the string/cluster database 310. 
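The membership rule above resembles the well-known "leader" clustering procedure. A minimal Python sketch follows, under several assumptions that the text does not specify. A chromosome is represented as a hypothetical tuple of column indices (`variables`) that selects which fields of each record are used. Each centroid stays fixed at the record that founded it. A record within the hyper-radius of several centroids joins the nearest one. The function name `lead_cluster_map` is hypothetical.

```python
from __future__ import annotations

import math
from typing import List, Sequence


def lead_cluster_map(
    records: Sequence[Sequence[float]],
    variables: Sequence[int],
    hyper_radius: float,
) -> List[List[int]]:
    """Group record indices into clusters around centroids (hypothetical helper)."""
    centroids: List[List[float]] = []
    clusters: List[List[int]] = []
    for idx, record in enumerate(records):
        # Project the record onto the variables named by the chromosome.
        point = [float(record[v]) for v in variables]
        # Find the nearest existing centroid by Euclidean distance.
        nearest, nearest_dist = -1, math.inf
        for c, centroid in enumerate(centroids):
            d = math.dist(point, centroid)
            if d < nearest_dist:
                nearest, nearest_dist = c, d
        if nearest_dist <= hyper_radius:
            # Inside the decision hyper-radius: join that centroid's cluster.
            clusters[nearest].append(idx)
        else:
            # Farther than the hyper-radius from every centroid: start a new cluster.
            centroids.append(point)
            clusters.append([idx])
    return clusters


records = [(0.0, 0.0, 9.0), (0.5, 0.0, 1.0), (5.0, 5.0, 3.0), (5.2, 5.1, 7.0), (10.0, 0.0, 2.0)]
print(lead_cluster_map(records, variables=(0, 1), hyper_radius=1.0))  # [[0, 1], [2, 3], [4]]
```

Keeping centroids fixed makes each map depend on record order. A variant that moves each centroid to its cluster mean would be an equally plausible reading of the text.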

The KDE 202 continues to step 308 wherein for each lead cluster map, it computes a 
variance across all of the clusters contained within that lead cluster map and records the variance 
in the string/cluster database 310. This step determines how homogeneous a given chromosome 
string 204 is to a predefined set of chromosome variables. The means for determining cluster
homogeneity is a statistical measure of the variability of records belonging to a cluster with 
respect to specific behaviors, outcomes, attributes or the like. In the preferred embodiment, 
variance is used as the measure of homogeneity, but this is for convenience. It would be readily 
apparent to one of ordinary skill in the relevant art to use any statistical measure. 
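One possible way to compute the variance for a lead cluster map is sketched below. It takes the population variance of an outcome attribute within each cluster and averages those variances, weighted by cluster size. The text does not say how per-cluster variances are combined, so the size weighting is an assumption. So are the function name `map_variance` and the single numeric `outcomes` list. As the text notes, any other statistical measure could be substituted.

```python
from __future__ import annotations

from statistics import pvariance
from typing import Sequence


def map_variance(clusters: Sequence[Sequence[int]], outcomes: Sequence[float]) -> float:
    """Size-weighted mean of within-cluster outcome variances (lower = more homogeneous)."""
    total = 0
    weighted = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        values = [outcomes[i] for i in members]
        # pvariance of a single value is 0.0, so a singleton cluster counts as homogeneous.
        weighted += len(values) * pvariance(values)
        total += len(values)
    if total == 0:
        raise ValueError("lead cluster map contains no records")
    return weighted / total


print(map_variance([[0, 1], [2, 3], [4]], [1.0, 1.0, 0.0, 2.0, 5.0]))  # 0.4
```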

Upon completion of step 308, the KDE 202 determines a best lead cluster map; that is, it 
determines which lead cluster map is the "best fit" with the given sets of chromosome variables. 

The KDE 202 continues to step 314 to determine whether the best lead cluster map is less 
than an acceptable minimum. The acceptable minimum may either be input by the user, or pre- 
defined within the KDE 202. 

If step 314 determines that the best lead cluster map is less than the acceptable minimum, 
then processing proceeds to step 316. In step 316, the KDE 202 records its final mapping in a 
chromosome map 210 and displays the best lead cluster map along with the matching variables. 

Returning to step 314, if the KDE 202 determines that the best lead cluster map is not less
than the acceptable minimum, the KDE 202 proceeds to step 312. 
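The decision at steps 308 through 316 could be sketched as below. This sketch assumes that the "best fit" lead cluster map is the one with the lowest recorded variance. It also assumes that "less than an acceptable minimum" compares that variance with the threshold. The function `decide` and the string keys are hypothetical.

```python
from __future__ import annotations

from typing import Dict, Tuple


def decide(map_variances: Dict[str, float], acceptable_minimum: float) -> Tuple[str, bool]:
    """Return the best-fit chromosome string and whether its map is acceptable.

    True corresponds to step 316 (record the final mapping); False corresponds
    to step 312 (re-process with the genetic algorithm). Hypothetical helper.
    """
    # Best fit: the chromosome string whose lead cluster map has the lowest variance.
    best = min(map_variances, key=map_variances.__getitem__)
    return best, map_variances[best] < acceptable_minimum


scores = {"string A": 0.4, "string B": 0.1, "string C": 0.9}
print(decide(scores, acceptable_minimum=0.2))  # ('string B', True)
```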

In step 312, the KDE 202 re-processes each processed chromosome string using the 
genetic algorithm. The genetic algorithm inputs the data for each processed chromosome string 
from the string/cluster database 310 and reanalyzes them according to the last set of information. 
After completing the re-ranking of the processed chromosome strings, the KDE 202 returns to 
step 306 to create new lead cluster maps for each processed chromosome string. The processing 
continues as described above. 
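The overall loop of FIG. 1 might be sketched as follows. The text does not state which genetic operators are applied at step 312, so this sketch substitutes a very simple scheme: keep the better half of the chromosomes and refill the population with randomly mutated copies. Several other elements are also assumptions: the `evolve` function, the tuple-of-indices chromosome encoding, the `score` callback (for example, the variance of a chromosome's lead cluster map), and the `max_generations` safeguard.

```python
from __future__ import annotations

import random
from typing import Callable, List, Optional, Tuple

Chromosome = Tuple[int, ...]  # hypothetical encoding: indices of selected variables


def evolve(
    population: List[Chromosome],
    score: Callable[[Chromosome], float],  # lower is better, e.g. map variance
    n_variables: int,
    acceptable_minimum: float,
    max_generations: int = 50,
    rng: Optional[random.Random] = None,
) -> Tuple[Chromosome, float]:
    """Repeat scoring and re-processing until the best score is acceptable."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(max_generations):
        # Steps 306-308: score every chromosome, best first.
        ranked = sorted(population, key=score)
        best_score = score(ranked[0])
        # Steps 314/316: stop once the best map beats the acceptable minimum.
        if best_score < acceptable_minimum:
            return ranked[0], best_score
        # Step 312 (simplified): keep the better half, refill with mutants.
        survivors = ranked[: max(1, len(ranked) // 2)]
        children: List[Chromosome] = []
        while len(survivors) + len(children) < len(population):
            child = list(rng.choice(survivors))
            child[rng.randrange(len(child))] = rng.randrange(n_variables)
            children.append(tuple(child))
        population = survivors + children
    # Safeguard: give up after max_generations and report the best found.
    best = min(population, key=score)
    return best, score(best)


pop = [(5, 5), (6, 7), (9, 9), (8, 2)]
best, best_score = evolve(pop, score=lambda c: float(sum(c)), n_variables=10, acceptable_minimum=3.0)
print(best, best_score)
```

The better half of each generation is always kept, so the best score can never get worse from one generation to the next.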



